Rough Set Extensions for Feature Selection
Abstract
Rough set theory (RST) was proposed as a mathematical tool for analysing imprecise, uncertain or incomplete information and knowledge. It is of fundamental importance to artificial intelligence, particularly in the areas of knowledge discovery, machine learning, decision support systems, and inductive reasoning. At the heart of RST is the idea of employing only the information contained within the data; unlike many other methods, it requires no probability distribution information or assignments. RST relies on the concept of indiscernibility to group equivalent elements and generate knowledge granules. These granules are then used to build a structure that approximates a given concept. This framework has proven successful when applied to the task of feature selection. Feature selection (FS) is the problem of selecting the input attributes that are most predictive of a given outcome. Unlike other dimensionality reduction methods, FS algorithms preserve the original semantics of the features following reduction. FS has been applied to tasks involving datasets with huge numbers of features (on the order of tens of thousands), which would be impossible to process otherwise. Recent examples of such problems include text processing and web content classification. FS techniques have also been applied to small and medium-sized datasets in order to discover the most information-rich features. The application of rough sets to FS has resulted in many efficient algorithms. However, due to the granularity of the approximations generated by the rough set approach, there is often a resulting level of uncertainty. This uncertain information is usually ignored for FS (by the very fact that it is ‘uncertain’). In this thesis, a number of methods are proposed which attempt to use the uncertain information to improve the performance of rough sets, and extensions thereof, for the task of FS.
These approaches are applied to two application domain problems where the reduction of features is of high importance: mammographic image analysis and complex systems monitoring. The utility of the approaches is demonstrated and compared empirically with several other dimensionality reduction techniques. In several experimental evaluations, the approaches are shown to equal or improve classification accuracy when compared with results obtained from unreduced data. Based on the new fuzzy-rough approaches, further developments and techniques are also presented in this thesis. The first of these is the application of a nearest neighbour classifier to the classification of real-valued data. This technique is evaluated within the mammographic imaging application. In addition, a novel unsupervised feature selection approach (UFRFS) is proposed, which reduces features by eliminating those considered redundant. Both the fuzzy-rough classifier mentioned above and UFRFS are employed and evaluated for the complex systems monitoring application.
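The crisp rough set notions mentioned in the abstract (indiscernibility, granules, approximations, and the uncertain boundary between them) can be illustrated with a minimal Python sketch. This is standard textbook rough set machinery, not the methods proposed in the thesis; the function names (`partition`, `approximations`, `dependency`) and the toy decision table are assumptions made purely for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into equivalence classes (granules) of the
    indiscernibility relation induced by the attributes in `attrs`."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximations(rows, attrs, concept):
    """Return the (lower, upper) approximations of `concept`, a set of indices."""
    lower, upper = set(), set()
    for block in partition(rows, attrs):
        if block <= concept:   # granule lies entirely inside the concept
            lower |= block
        if block & concept:    # granule overlaps the concept
            upper |= block
    return lower, upper


def dependency(rows, attrs, decision):
    """Dependency degree: fraction of objects in the positive region, i.e.
    objects classified with certainty using only `attrs`."""
    positive = set()
    for decision_class in partition(rows, [decision]):
        positive |= approximations(rows, attrs, decision_class)[0]
    return len(positive) / len(rows)


# Hypothetical toy decision table: conditional attributes "a", "b"; decision "d".
rows = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "no"},
]
concept = {i for i, r in enumerate(rows) if r["d"] == "yes"}

lower, upper = approximations(rows, ["a", "b"], concept)
boundary = upper - lower  # the uncertain region that classical FS ignores

print(lower, upper, boundary)
print(dependency(rows, ["a", "b"], "d"), dependency(rows, ["a"], "d"))
```

In this toy table, objects 0 and 1 are indiscernible on `a` and `b` yet carry different decisions, so they fall into the boundary region. A dependency-based feature selector would compare values such as `dependency(rows, ["a", "b"], "d")` and `dependency(rows, ["a"], "d")` to judge whether dropping an attribute loses discriminating power.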
Similar resources
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Fuzzy-rough Information Gain Ratio Approach to Filter-wrapper Feature Selection
Feature selection for various applications has been carried out for many years in many different research areas. However, there is a trade-off between finding feature subsets with minimum length and increasing the classification accuracy. In this paper, a filter-wrapper feature selection approach based on fuzzy-rough gain ratio is proposed to tackle this problem. As a search strategy, a modifie...
Rough Sets as a Framework for Data Mining
The issues of the real world are: a) very large data sets; b) mixed types of data (continuous valued, symbolic data); c) uncertainty (noisy data); d) incompleteness (missing, incomplete data); e) data change; f) use of background knowledge. The main goal of rough set analysis is the induction of approximations of concepts. Rough sets constitute a sound basis for KDD. It offers mathematical tools to disco...
Combination of Feature Selection and Learning Methods for IoT Data Fusion
In this paper, we propose five data fusion schemes for the Internet of Things (IoT) scenario, which are Relief and Perceptron (Re-P), Relief and Genetic Algorithm Particle Swarm Optimization (Re-GAPSO), Genetic Algorithm and Artificial Neural Network (GA-ANN), Rough and Perceptron (Ro-P) and Rough and GAPSO (Ro-GAPSO). All the schemes consist of four stages, including preprocessing the data set ba...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining, science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. Redundant genes can thus be removed, increasing the speed and accuracy of classification. After applying the gene se...
Application of Fuzzy-rough Set Theory for Feature Subset Selection
Fuzzy Set Theory and Rough Set Theory are the most popular mathematical tools for dealing with uncertainties. Over the past decades, these set theories have been applied successfully in several areas to solve many complex tasks. This paper is concerned with the application of a hybrid fuzzy-rough set based approach to feature subset selection. Keywords: Fuzzy set theory, Rough Set theory, Fuzzy...
Publication date: 2009